hw: fix WMMA fp16/bf16 RTL output handling by cassuto · Pull Request #359 · vortexgpgpu/vortex

cassuto · 2026-06-03T08:11:59Z

Issue

WMMA fp16->fp16 and bf16->bf16 operations ignored fmt_d: the RTL always treated the accumulator input/output path as FP32.

Root Cause

VX_tcu_fedp_bhf was missing destination-format handling. When fmt_d selected fp16 or bf16, the RTL still produced FP32-formatted bits instead of the expected 16-bit fp16/bf16 result in the low halfword.
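To make the mismatch concrete, here is a minimal software sketch of destination-format packing. It is not the RTL and not the code in this PR. It assumes round-to-nearest-even, assumes the upper halfword is zero for 16-bit results, and uses hypothetical string labels for `fmt_d` (the real encoding of `fmt_d` is not shown here). The helper names are illustrative only.

```python
import math
import struct


def f32_bits(x: float) -> int:
    """Return the IEEE 754 binary32 encoding of x as an unsigned integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def to_fp16_bits(x: float) -> int:
    """Round x to binary16 (round-to-nearest-even) and return the 16-bit encoding."""
    try:
        return struct.unpack("<H", struct.pack("<e", x))[0]
    except OverflowError:
        # struct rejects finite values that round past the fp16 range;
        # map them to signed infinity instead.
        return 0xFC00 if math.copysign(1.0, x) < 0 else 0x7C00


def to_bf16_bits(x: float) -> int:
    """Round x to bfloat16 (round-to-nearest-even) and return the 16-bit encoding."""
    bits = f32_bits(x)
    if math.isnan(x):
        # Keep the top 16 bits and force a quiet NaN.
        return ((bits >> 16) | 0x0040) & 0xFFFF
    # Add 0x7FFF plus the LSB of the kept half so that ties round to even.
    bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + bias) >> 16) & 0xFFFF


def pack_result(x: float, fmt_d: str) -> int:
    """Return the 32-bit result word for destination format fmt_d.

    fmt_d labels are hypothetical. For 16-bit formats the value sits in the
    low halfword; the zeroed upper halfword is an assumption of this sketch.
    """
    if fmt_d == "fp32":
        return f32_bits(x)
    if fmt_d == "fp16":
        return to_fp16_bits(x)
    if fmt_d == "bf16":
        return to_bf16_bits(x)
    raise ValueError(f"unknown destination format: {fmt_d}")
```

For an accumulator value of 1.0, the FP32 path yields `0x3F800000`, while an fp16 destination expects `0x3C00` and a bf16 destination expects `0x3F80` in the low halfword.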

Proposal

Add fmt_d handling in the BHF TCU datapath so fp16 and bf16 WMMA outputs are rounded and packed in the expected destination format. Extend the SGEMM TCU regression coverage to include fp16->fp16 and bf16->bf16 cases, with ULP checks in the native 16-bit encoding space.
The fix passes synthesis testing.
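The ULP check in the native 16-bit encoding space can be sketched as follows. This is an illustration of the general technique, not the regression code added by this PR; the function names and the tolerance parameter are hypothetical. It relies only on fp16 and bf16 both being 16-bit sign-magnitude encodings, and it omits NaN handling.

```python
def ordered(bits: int) -> int:
    """Map a 16-bit sign-magnitude float encoding onto a monotonic integer line.

    Positive encodings map to their magnitude, negative ones to the negated
    magnitude, so +0 (0x0000) and -0 (0x8000) both map to 0.
    """
    magnitude = bits & 0x7FFF
    return -magnitude if bits & 0x8000 else magnitude


def ulp_distance_16(a_bits: int, b_bits: int) -> int:
    """Number of representable 16-bit values between two encodings.

    Works for fp16 and bf16 alike, because the distance is measured on the
    raw encodings rather than on values widened to FP32. NaNs are not handled.
    """
    return abs(ordered(a_bits) - ordered(b_bits))


def within_ulp(actual_bits: int, expected_bits: int, max_ulp: int) -> bool:
    """Return True if the two encodings differ by at most max_ulp steps."""
    return ulp_distance_16(actual_bits, expected_bits) <= max_ulp
```

Comparing in the 16-bit encoding space keeps the tolerance meaningful: one fp16 or bf16 ULP corresponds to many FP32 ULPs, so an FP32-space check with a small tolerance would reject correctly rounded 16-bit results.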

Your Name added 4 commits June 3, 2026 16:14

test: cover fp16 bf16 WMMA outputs

cc0ba35

hw: fix WMMA fp16 bf16 outputs

a60a013

ci: add TCU BHF synthesis coverage

a543094

ci: fix TCU BHF synthesis coverage

ce6d361

cassuto force-pushed the fix_wmma_bf16_fp16 branch from ff0fbc4 to ce6d361 on June 3, 2026 08:14

cassuto wants to merge 4 commits into vortexgpgpu:master from cassuto:fix_wmma_bf16_fp16
